DBAAS-7956: Percona MySQL metrics — full catalog and aggregation #354

Merged
atelpis merged 11 commits into master from rshah/percona-mysql-whitelist on Apr 21, 2026

Collaborator

@rshahdo rshahdo commented Apr 10, 2026

Summary

Adds full metric support for the Advanced MySQL (Percona) offering: expanded whitelist, cardinality-safe aggregation rules, and resilient delivery under DOKS rate-limit collisions.

Changes

whitelist.go — Full Percona MySQL metric catalog

  • MySQL global status (uptime, queries, threads, connections, slow queries, commands)
  • InnoDB status (buffer pool pages, row ops, data reads/writes, deadlocks, log waits, doublewrite)
  • InnoDB variables (buffer pool size, max connections, log file size, thread concurrency)
  • Information schema InnoDB metrics (transaction history, purge stats, adaptive hash index)
  • Performance schema file events (IO latencies)
  • Group Replication member status, replication lag, transaction certifier, and transaction flow
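A whitelist of this kind can be sketched as a set lookup on the metric name. This is a minimal illustration, not the contents of whitelist.go: the `whitelist` map, `Allowed`, and `Filter` are hypothetical names, only the two metric names that appear in this PR description are listed, and `mysql_example_unlisted_metric` is an invented name used to show rejection.

```go
package main

import "fmt"

// whitelist is a hypothetical set of allowed metric names. Only names that
// appear in this PR description are listed; the real catalog has 54 entries.
var whitelist = map[string]struct{}{
	"mysql_global_status_commands_total":       {},
	"mysql_global_status_innodb_row_ops_total": {},
}

// Allowed reports whether a metric name is in the whitelist.
func Allowed(name string) bool {
	_, ok := whitelist[name]
	return ok
}

// Filter returns the whitelisted names in their original order.
func Filter(names []string) []string {
	kept := make([]string, 0, len(names))
	for _, n := range names {
		if Allowed(n) {
			kept = append(kept, n)
		}
	}
	return kept
}

func main() {
	scraped := []string{
		"mysql_global_status_commands_total",
		"mysql_example_unlisted_metric", // invented name, not whitelisted
	}
	fmt.Println(Filter(scraped))
}
```

A map keyed by name keeps the lookup constant-time, which matters when every scraped series is checked on each cycle.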

aggregation.go — High-cardinality label collapse

  • Adds aggregation rules for 5 metrics that fan out by label:
    • mysql_global_status_commands_total → strips command (~50-100 values)
    • mysql_global_status_innodb_row_ops_total → strips operation (4 values)
    • mysql_perf_schema_file_events_* (3 metrics) → strips event_name (dozens of values)
  • Follows the same pattern used by PostgreSQL (table_name), Kafka (topic, partition), and OpenSearch (node_host, cluster_name)
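The label collapse described above can be sketched as follows: drop the high-cardinality label, then sum the samples that become identical. This is an illustration under stated assumptions, not the code in aggregation.go. The `Sample` type, `stripRules`, `seriesKey`, and `Aggregate` are hypothetical names, the `host` label is invented for the example, and only the two fully named metrics are covered because the three `mysql_perf_schema_file_events_*` names are not spelled out in the description.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is a hypothetical, simplified stand-in for one scraped series value.
type Sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// stripRules maps a metric name to the label dropped before summing.
var stripRules = map[string]string{
	"mysql_global_status_commands_total":       "command",
	"mysql_global_status_innodb_row_ops_total": "operation",
}

// seriesKey builds a deterministic identity from the name and sorted labels.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(name)
	for _, k := range keys {
		fmt.Fprintf(&b, ",%s=%s", k, labels[k])
	}
	return b.String()
}

// Aggregate drops the configured label from matching samples and sums the
// samples whose name and remaining labels are equal. Samples without a rule
// keep all of their labels.
func Aggregate(in []Sample) []Sample {
	index := map[string]int{}
	var out []Sample
	for _, s := range in {
		drop, hasRule := stripRules[s.Name]
		labels := make(map[string]string, len(s.Labels))
		for k, v := range s.Labels {
			if hasRule && k == drop {
				continue
			}
			labels[k] = v
		}
		key := seriesKey(s.Name, labels)
		if i, seen := index[key]; seen {
			out[i].Value += s.Value
			continue
		}
		index[key] = len(out)
		out = append(out, Sample{Name: s.Name, Labels: labels, Value: s.Value})
	}
	return out
}

func main() {
	in := []Sample{
		{"mysql_global_status_commands_total", map[string]string{"command": "select", "host": "db-1"}, 3},
		{"mysql_global_status_commands_total", map[string]string{"command": "insert", "host": "db-1"}, 4},
	}
	for _, s := range Aggregate(in) {
		fmt.Println(s.Name, s.Labels, s.Value)
	}
}
```

Summing is only meaningful here because the affected metrics are counters split by a label; the total across all `command` values is still a valid counter.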

Metric catalog summary

| Category | Count | Source |
| --- | --- | --- |
| MySQL global status | 13 | --collect.global_status |
| InnoDB status | 19 | --collect.global_status |
| InnoDB variables | 4 | --collect.global_variables |
| Info schema InnoDB | 5 | --collect.info_schema.innodb_metrics |
| Perf schema file events | 3 | --collect.perf_schema.file_events |
| Replication / transactions | 10 | --collect.perf_schema.replication_group_member_stats |
| Total | 54 | |

@rshahdo rshahdo requested a review from a team as a code owner April 10, 2026 06:29
rshahdo added 5 commits April 14, 2026 22:12
On DOKS nodes the dbaas do-agent sidecar shares a per-droplet
rate-limit bucket with the system do-node-agent DaemonSet. A single
retry was insufficient for double-collisions. This adds 3 retries
(10s, 15s, 20s backoff) and resets lastFlushAttempt after each to
prevent the internal rate limiter from blocking subsequent attempts.

Made-with: Cursor
…re gap

After 429 retries the agent has already spent ~45s backing off. The
additional full-cycle sleep (120s) created a ~5 min gap before the
next successful push. Removing it shrinks recovery to ~2.5 min
(one missed cycle), which stays within the 5m PromQL rate window.

Made-with: Cursor
Replace the initial 10-metric POC whitelist with the complete set
covering global status, InnoDB status/variables, info_schema InnoDB
metrics, perf_schema file events, Group Replication member status,
replication lag, and transaction certifier/flow metrics.

Made-with: Cursor
…metrics

Strip command/operation/event_name labels from commands_total,
innodb_row_ops_total, and perf_schema file_events to collapse
per-value series into summed totals, preventing cardinality from
exceeding Sonar's batch size limit.

Made-with: Cursor
@rshahdo rshahdo changed the title from "DBAAS-7956: Adding new set of whitelist for percona mysql" to "DBAAS-7956: Percona MySQL metrics — full catalog, 429 retry, and aggregation" on Apr 14, 2026
Comment thread pkg/clients/tsclient/client.go Outdated
@atelpis atelpis changed the title from "DBAAS-7956: Percona MySQL metrics — full catalog, 429 retry, and aggregation" to "DBAAS-7956: Percona MySQL metrics — full catalog and aggregation" on Apr 21, 2026
@atelpis atelpis merged commit 06d84e0 into master Apr 21, 2026
2 checks passed
@atelpis atelpis deleted the rshah/percona-mysql-whitelist branch April 21, 2026 02:21